Automatically Augmenting Terminological Lexicons from Untagged Text

Authors

George Demetriou

Robert J. Gaizauskas

Abstract

Lexical resources play a crucial role in language technology, but lexical acquisition is often a time-consuming, laborious and costly exercise. In this paper, we describe a method for the automatic acquisition of technical terminology from domain-restricted texts without the need for sophisticated natural language processing tools, such as taggers or parsers, or text corpora annotated with labelled cases. The method is based on the idea of using prior or seed knowledge in order to discover co-occurrence patterns for the terms in the texts. A bootstrapping algorithm has been developed that identifies patterns and new terms in an iterative manner. Experiments with scientific journal abstracts in the biology domain indicate an accuracy rate for the extracted terms ranging from 58% to 71%. The new terms have been found useful for improving the coverage of a system used for terminology identification tasks in the biology domain.
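The iterative idea in the abstract (seed terms lead to co-occurrence patterns, which lead to new terms) can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes single-token terms, treats a "pattern" as the pair of words immediately to the left and right of a known term, and uses a hypothetical frequency threshold `min_pattern_count`. The function names and example sentences are also invented for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9-]+", text.lower())


def bootstrap_terms(sentences, seeds, iterations=3, min_pattern_count=2):
    """Return new candidate terms found by a toy bootstrapping loop.

    Assumption: a pattern is the (left word, right word) context of a
    known term; the paper's actual pattern definition may differ.
    """
    seed_set = {s.lower() for s in seeds}
    terms = set(seed_set)
    tokenized = [tokenize(s) for s in sentences]

    for _ in range(iterations):
        # Step 1: count the contexts in which known terms occur.
        patterns = Counter()
        for toks in tokenized:
            for i in range(1, len(toks) - 1):
                if toks[i] in terms:
                    patterns[(toks[i - 1], toks[i + 1])] += 1
        good = {p for p, count in patterns.items() if count >= min_pattern_count}

        # Step 2: collect unknown words that occur in a frequent context.
        new_terms = set()
        for toks in tokenized:
            for i in range(1, len(toks) - 1):
                if (toks[i - 1], toks[i + 1]) in good and toks[i] not in terms:
                    new_terms.add(toks[i])

        if not new_terms:
            break  # nothing new was found, so further passes cannot help
        terms |= new_terms

    return terms - seed_set


example_sentences = [
    "The enzyme trypsin cleaves peptides.",
    "The enzyme pepsin cleaves proteins.",
    "The enzyme chymotrypsin cleaves peptides.",
    "A buffer was added to the sample.",
]
print(bootstrap_terms(example_sentences, ["trypsin", "pepsin"]))
```

With the two seeds above, the context "enzyme _ cleaves" occurs twice and therefore qualifies as a pattern, so the sketch picks up "chymotrypsin" as a new candidate. A real system would also need to handle multi-word terms and filter noisy patterns, which this sketch does not attempt.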


Similar resources

Automatically Creating Bilingual Lexicons for Machine Translation from Bilingual Text

A method is presented for automatically augmenting the bilingual lexicon of an existing Machine Translation system by extracting bilingual entries from aligned bilingual text. The proposed method relies only on the resources already available in the MT system itself. It is based on the use of bilingual lexical templates to match the terminal symbols in the parses of the aligned sentences.


Syntactic Parsing as a Step for Automatically Augmenting Semantic Lexicons

This paper investigates how, and to what extent the flexibility and robustness of a partial parser can be utilized to automatically extend existing semantic lexicons. Our work is based on the observation that members of a semantic group are often surrounded by other members of the same group in text. Given a few category members we collect surrounding contexts and try to identify other words th...


Lexical Semantic Resources in a Terminological Network

Research is in progress on the construction of three specialized lexicons organized as relational databases. The three databases contain terms belonging to the specialized knowledge fields of maritime terminology (technical-nautical and maritime transport domain), taxation law, and labour law with union labour rules, respectively. The EuroWordNet/Ita...


Towards a Standardized Linguistic Annotation of the Textual Content of Labels in Knowledge Representation Systems

We propose applying standardized linguistic annotation to terms included in labels of knowledge representation schemes (taxonomies or ontologies), hypothesizing that this would help improve ontology-based semantic annotation of texts. We share the view that currently used methods for including lexical and terminological information in such hierarchical networks of concepts are not satisfactor...


Automatically Generating Extraction Patterns from Untagged Text

Many corpus-based natural language processing systems rely on text corpora that have been manually annotated with syntactic or semantic tags. In particular, all previous dictionary construction systems for information extraction have used an annotated training corpus or some form of annotated input. We have developed a system called AutoSlog-TS that creates dictionaries of extraction patterns u...



Publication date: 2000
